[SPARK-45190][SPARK-48897][PYTHON][CONNECT] Make `from_xml` support StructType schema #47355
also cc @sandip-db
@@ -16303,7 +16303,21 @@ def from_xml(
 >>> df.select(sf.from_xml(df.value, schema).alias("xml")).collect()
 [Row(xml=Row(a=1))]

-Example 2: Parsing XML with :class:`ArrayType` in schema
+Example 2: Parsing XML with a :class:`StructType` schema
Thanks for fixing this. Can you please reuse the existing jira #SPARK-45190?
sure, I was not aware of that ticket, will also add it to the title
Also, can you please enable tests here
sure
Merged to master.
…tructType schema

### What changes were proposed in this pull request?
Make `from_xml` support StructType schema

### Why are the changes needed?
StructType schema was supported in Spark Classic, but not in Spark Connect

to address apache#43680 (comment)

### Does this PR introduce _any_ user-facing change?
before:
```
from pyspark.sql.types import StructType, LongType
import pyspark.sql.functions as sf

data = [(1, '''<p><a>1</a></p>''')]
df = spark.createDataFrame(data, ("key", "value"))
schema = StructType().add("a", LongType())
df.select(sf.from_xml(df.value, schema)).show()
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[1], line 7
...
AnalysisException: [PARSE_SYNTAX_ERROR] Syntax error at or near '{'. SQLSTATE: 42601

JVM stacktrace:
org.apache.spark.sql.AnalysisException
    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:278)
    at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:98)
    at org.apache.spark.sql.catalyst.parser.AbstractParser.parseDataType(parsers.scala:40)
    at org.apache.spark.sql.types.DataType$.$anonfun$fromDDL$1(DataType.scala:126)
    at org.apache.spark.sql.types.DataType$.parseTypeWithFallback(DataType.scala:145)
    at org.apache.spark.sql.types.DataType$.fromDDL(DataType.scala:127)
```

after:
```
+---------------+
|from_xml(value)|
+---------------+
|            {1}|
+---------------+
```

### How was this patch tested?
added doctest and enabled unit tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#47355 from zhengruifeng/from_xml_struct.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Make `from_xml` support StructType schema
Why are the changes needed?
StructType schema was supported in Spark Classic, but not in Spark Connect
to address #43680 (comment)
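One possible reading of the traceback in the commit message: the failure is raised from `DataType.fromDDL`, and the parser stops at `'{'`, which would be consistent with a schema string in JSON form reaching a parser that expects DDL. That is an assumption and is not confirmed anywhere in this PR. The sketch below uses only the standard library and hypothetical helper names (`to_json_schema`, `to_ddl_schema`) to contrast the two encodings for a one-field struct; it does not use PySpark.

```python
import json

# Hypothetical stand-in for a one-field struct schema:
# (field name, JSON-style type name, DDL-style type name).
# This is NOT the PySpark StructType class, only an illustration.
fields = [("a", "long", "BIGINT")]


def to_json_schema(fields):
    """Encode the fields as a JSON object; the text therefore starts with '{'."""
    doc = {
        "type": "struct",
        "fields": [
            {"name": name, "type": json_type, "nullable": True, "metadata": {}}
            for name, json_type, _ in fields
        ],
    }
    return json.dumps(doc, separators=(",", ":"), sort_keys=True)


def to_ddl_schema(fields):
    """Encode the same fields as a DDL-style type string."""
    body = ", ".join(f"{name}: {ddl_type}" for name, _, ddl_type in fields)
    return f"STRUCT<{body}>"


json_schema = to_json_schema(fields)
ddl_schema = to_ddl_schema(fields)

print(json_schema)  # a JSON object, beginning with '{'
print(ddl_schema)   # STRUCT<a: BIGINT>
```

A DDL parser has no rule that begins with `{`, so under this assumption the JSON form would fail at the first character, which is where the reported syntax error points.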
Does this PR introduce any user-facing change?
before:
after:
How was this patch tested?
added doctest and enabled unit tests
Was this patch authored or co-authored using generative AI tooling?
no